Steganography is the process of hiding secret information by embedding it in
an "innocent" message. We present protocols for hiding quantum information
in a codeword of a quantum error-correcting code passing through a channel.
Using either a shared classical secret key or shared entanglement, the sender
(Alice) disguises her information as errors in the channel. The receiver (Bob)
can retrieve the hidden information, but an eavesdropper (Eve) with the power
to monitor the channel, but without the secret key, cannot distinguish the
message from channel noise. We analyze how difficult it is for Eve to detect the
presence of secret messages, and estimate rates of steganographic communication
and secret key consumption for certain protocols.

Steganography is the science of hiding a message within a larger innocent-looking plain-text 
message, and communicating the resulting data over a communications channel or by a courier 
so that the steganographic message is readable only by the intended receiver. The word comes 
from the Greek words steganos which means "covered," and graphia which means "writing." 
The art of information hiding dates back to 440 B.C. to the Greeks (1). The term steganography
was first used in 1499 by Johannes Trithemius in his Steganographia, which was one of the first
treatises on the use of cryptographic and steganographic techniques (2).

The modern study of steganography was initiated by Simmons, and the paradigm can be
stated as follows (3). Alice and Bob are imprisoned in two different cells that are far apart.
They would like to devise an escape plan, but the only way they can communicate with each 
other is through a courier who is under the command of the warden (Eve, the adversary) of the 
penitentiary. The courier leaks all information to the warden. If the warden suspects that either 
Alice or Bob are conspiring to escape from the penitentiary, she will cut off all communication 
between them, and move both of them to a maximum security cell. Prior to their incarceration 
Alice and Bob had access to a shared secret key — assumed to be a sufficiently long string of 
random bits — which they later exploit to send secret messages hidden in a cover text. Can Alice 
and Bob devise an escape plan without arousing the suspicion of the warden? 

Julio Gea-Banacloche (4) introduced the idea of hiding secret messages in the form of error
syndromes by deliberately applying correctable errors to a quantum state encoded in the
three-bit repetition quantum error-correcting code (QECC). His paper, however, did not address
the issue of an innocent-looking message: in the protocol he proposed, the messages would
not resemble a plausible quantum channel. Addressing this issue is one of the major
contributions of our work. Curty et al. proposed three different quantum steganographic
protocols (5). However, none of these protocols addresses the issue of communicating an
innocent message over a noisy classical channel or a general quantum channel, or gives
key-consumption rates. Natori provides a rudimentary treatment of quantum steganography
which is a modification of superdense coding (6). Martin introduced a notion of quantum
steganographic communication in (7). His protocol is a variation of Bennett and Brassard's
quantum key distribution (QKD) protocol, in which he hides a steganographic channel in the
QKD protocol.

Our treatment of quantum steganography is more general than those above. We provide a
protocol (Fig. 1) for hiding quantum information using typical sequences of errors for general
quantum channels. We begin by showing how quantum information can be hidden in the noise
of a depolarizing channel, using a shared classical secret key between Alice and Bob. In our
first quantum steganographic protocol the channel is intrinsically noiseless (i.e., all noise is
controlled by Alice), and in the second case the channel has its own intrinsic noise (not
controlled by Alice and Bob). We calculate the amount of secret key consumed. We later present
a quantum steganographic protocol for general quantum channels. We also discuss whether Alice
and Bob can send a finite amount of hidden information, or can actually communicate at a nonzero
asymptotic rate (given an arbitrarily large secret key). This depends on Eve's knowledge of the
physical channel, and Alice and Bob's knowledge of Eve's expectations. Finally, we address
the question of security. This is two-fold: first, can Eve detect that a secret message has been
sent? And second, can she read the message?

The quantum analog of the classical binary symmetric channel (BSC) is the depolarizing
channel (DC), which is one of the most widely used quantum channel models:

$\rho \to \mathcal{N}\rho = (1-p)\rho + \frac{p}{3} X\rho X + \frac{p}{3} Y\rho Y + \frac{p}{3} Z\rho Z$ .   (1)

That is, each qubit has an equal probability of undergoing an X, Y, or Z error. Applying this
channel repeatedly to a qubit will map it eventually to the maximally mixed state $I/2$. We can
rewrite this channel in a different but equivalent form:

$\mathcal{N} = (1 - 4p/3)\mathcal{I} + (4p/3)\mathcal{T}$ ,   (2)

where $\mathcal{I}\rho = \rho$ and $\mathcal{T}\rho = (1/4)(\rho + X\rho X + Y\rho Y + Z\rho Z)$. The operation
$\mathcal{T}$ is twirling: it takes a qubit in any state $\rho$ to the maximally mixed state $I/2$. If we
rewrite the channel in this way, instead of applying X, Y, or Z errors with probability $p/3$
each, we can think of removing the qubit with probability $4p/3$ and replacing it with a
maximally mixed state. This picture makes the steganographic protocol more transparent. We
will first assume that the actual physical channel between Alice and Bob is noiseless. All the
noise that Eve sees is due to deliberate errors that Alice applies to her codewords.
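The equivalence of forms (1) and (2), and the fact that twirling sends any qubit state to the maximally mixed state, can be checked numerically. The following is a minimal sketch in plain Python; the function names and the test state `rho` are illustrative choices, not part of the protocol.

```python
# 2x2 complex matrices as nested lists; only the standard library is used.
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def conjugate_by(u, rho):
    """Return u rho u; the Pauli operators are Hermitian, so u^dagger = u."""
    return matmul(matmul(u, rho), u)


def weighted_sum(terms):
    """Sum of weight * matrix over a list of (weight, matrix) pairs."""
    return [[sum(w * m[i][j] for w, m in terms) for j in range(2)]
            for i in range(2)]


def depolarize_pauli_form(rho, p):
    """Form (1): X, Y and Z errors, each with probability p/3."""
    return weighted_sum([(1 - p, rho)]
                        + [(p / 3, conjugate_by(u, rho)) for u in (X, Y, Z)])


def twirl(rho):
    """The twirling map T: average over conjugation by I, X, Y, Z."""
    return weighted_sum([(0.25, conjugate_by(u, rho)) for u in (I2, X, Y, Z)])


def depolarize_twirl_form(rho, p):
    """Form (2): keep the qubit with prob. 1 - 4p/3, twirl it with prob. 4p/3."""
    return weighted_sum([(1 - 4 * p / 3, rho), (4 * p / 3, twirl(rho))])


def close(a, b, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


rho = [[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]]  # an arbitrary valid qubit state
```

For this `rho`, `twirl(rho)` agrees with $I/2$ and the two channel forms agree for any `p`, which is the property the first protocol relies on: a twirled stego qubit looks exactly like a qubit that the channel has depolarized.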

1. Alice encodes a covertext of $k_c$ qubits into $N$ qubits with an $[[N, k_c]]$ quantum
error-correcting code (QECC).

2. From (2), the DC would maximally mix $Q$ qubits with probability $p_Q$, where

$p_Q = \binom{N}{Q} (4p/3)^Q (1 - 4p/3)^{N-Q}$ .   (3)

For large $N$, Alice can send $M = (4/3)pN(1-\delta)$ stego qubits, where
$1 \gg \delta \gg \sqrt{(1-4p/3)/((4p/3)N)}$. (The chance of fewer than $M$ errors is negligibly small.)

3. Using the shared random key (or shared ebits), Alice chooses a random subset of $M$ qubits
out of the $N$, and swaps her $M$ stego qubits for those qubits of the codeword. She also
replaces a random number $m$ of qubits outside this subset with maximally mixed qubits,
so that the total $Q = M + m$ matches the binomial distribution (3) to high accuracy.

4. Alice "twirls" her $M$ stego qubits using $2M$ bits of secret key or $2M$ shared ebits. To
each qubit she applies one of $I$, $X$, $Y$, or $Z$ chosen at random, so $\rho \to \mathcal{T}\rho$. To Eve, who
does not have the key, these qubits appear maximally mixed. (Twirling can be thought of
as the quantum equivalent of a one-time pad.)

5. Alice transmits the codeword to Bob. From the secret key, he knows the correct subset of
$M$ qubits, and the one-time pad to decode them.

This protocol transmits $(4/3)pN(1-\delta)$ secret qubits from Alice to Bob (Fig. 2).
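The counting in step 2 can be illustrated numerically. The sketch below assumes the number $Q$ of maximally mixed qubits is binomial with success probability $4p/3$, as in the twirling picture of the depolarizing channel; the function names and the parameter values are hypothetical choices made only for illustration.

```python
import math


def stego_qubits(N, p, delta):
    """M = (4/3) p N (1 - delta), rounded down to a whole number of qubits."""
    return math.floor(4 * p * N * (1 - delta) / 3)


def prob_mixed(N, p, Q):
    """Binomial probability that the emulated DC maximally mixes exactly Q qubits.

    Evaluated in log space so that large N does not overflow.
    """
    q = 4 * p / 3
    log_pmf = (math.lgamma(N + 1) - math.lgamma(Q + 1) - math.lgamma(N - Q + 1)
               + Q * math.log(q) + (N - Q) * math.log1p(-q))
    return math.exp(log_pmf)


def prob_fewer_than(N, p, M):
    """Chance that the channel being emulated would mix fewer than M qubits."""
    return sum(prob_mixed(N, p, Q) for Q in range(M))


def delta_scale(N, p):
    """Relative fluctuation sqrt((1 - 4p/3) / ((4p/3) N)) that delta must exceed."""
    q = 4 * p / 3
    return math.sqrt((1 - q) / (q * N))


# Illustrative numbers (assumptions, not taken from the text).
N, p, delta = 2000, 0.03, 0.3
M = stego_qubits(N, p, delta)
shortfall = prob_fewer_than(N, p, M)
```

With these values `delta` is roughly three times `delta_scale(N, p)`, and `shortfall` is well below one percent, so Alice can almost always fit her `M` stego qubits inside a typical error pattern. Making `delta` smaller raises `M` but also raises the chance that a typical draw of `Q` falls below `M`.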

If the channel contains intrinsic noise, Alice will first have to encode her stego qubits in
an $[[M, k_s]]$ QECC, swap those $M$ qubits for a random subset of $M$ qubits in the codeword,
and apply the twirling procedure. This twirling does not interfere with the error-correcting
power of the QECC if Bob knows the key. Assuming the physical channel is also a DC with
error rate $p$, and that Alice emulates a DC with error rate $q$, the effective channel will appear
to Eve like a DC with error rate $p + q(1 - 4p/3) = p + \delta p$. The rate of transmission $k_s/N$
will depend on the rate of the QECC used to protect the stego qubits. For a BSC this would be
$(1-\delta)(1-h(p))\delta p/(1-2p)$. However, for most quantum channels (including the DC) the
achievable rate is not known.

The secret key is used at two points in these protocols. First, in step 3 Alice chooses a
random subset of $M$ qubits out of the $N$-qubit codeword. There are $C(N,M)$ subsets, so
roughly $\log_2 C(N,M)$ bits are needed to choose one. Next, in step 4, $2M$ bits of key are used
for twirling. This gives us

$n_k \approx \log_2 C(N,M) + 2M$   (4)

bits of secret key used. Define the key consumption rate $\mathcal{K} = n_k/N$ to be the number of bits
of key consumed per qubit that Alice sends through the channel. We use $M \approx 4qN/3$ and
$q \approx \delta p/(1 - 4p/3)$ to express $\mathcal{K}$ in terms of $p$ and $\delta p$ for large $N$ (Fig. 3):

$\mathcal{K} \approx \log_2\left[(4/\beta)^{\beta} (1-\beta)^{\beta-1}\right]$ ,   $\beta = 4\delta p/(3-4p)$ .   (5)

Alice can consume fewer bits of key if Bob and she have access to a source that averages to a
maximally mixed state. This would allow them to bypass the twirling procedure. The protocols
given above perform well in emulating a depolarizing channel. However, there are far more
general channels than these, and the protocols may not work well, or at all, in these cases. If
one has a channel that can be written

$\rho \to \mathcal{N}\rho = (1 - p_T - p_E)\mathcal{I}\rho + p_T\mathcal{T}\rho + p_E\mathcal{E}\rho$ ,   (6)

where $\mathcal{E}$ is an arbitrary error operation, one can still use the above protocols to hide
approximately $p_T N$ stego bits or qubits, while generating $p_E N$ random errors of type
$\mathcal{E}$. But for some channels, $p_T$ may be very small or zero. How should we proceed? Moreover,
hiding stego qubits locally as apparently maximally mixed qubits sacrifices some potential
information. The location of the error (that is, the choice of the subset holding the errors)
could also be used to convey information, potentially increasing the rate and reducing the
amount of secret key or shared entanglement required.
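The key count for the first protocol, $\log_2 C(N,M)$ bits for the subset plus $2M$ bits for twirling, can be compared with its large-$N$ form. The sketch below assumes the Stirling-limit rate $\mathcal{K} \approx \beta\log_2(4/\beta) - (1-\beta)\log_2(1-\beta)$ with $\beta = M/N = 4\delta p/(3-4p)$; the function names and numerical values are illustrative only.

```python
import math


def key_bits_exact(N, M):
    """Bits of key: log2 C(N, M) to pick the subset plus 2M for twirling."""
    return math.log2(math.comb(N, M)) + 2 * M


def beta_from_channel(p, dp):
    """Fraction of swapped qubits, beta = 4 dp / (3 - 4 p)."""
    return 4 * dp / (3 - 4 * p)


def key_rate_asymptotic(p, dp):
    """Large-N key rate: log2[(4/beta)^beta (1 - beta)^(beta - 1)]."""
    beta = beta_from_channel(p, dp)
    return beta * math.log2(4 / beta) + (beta - 1) * math.log2(1 - beta)


# Illustrative values (assumptions): physical rate p, emulated excess dp.
p, dp, N = 0.1, 0.01, 26000
M = round(beta_from_channel(p, dp) * N)
exact_rate = key_bits_exact(N, M) / N
approx_rate = key_rate_asymptotic(p, dp)
```

For these values the exact and asymptotic rates differ only in the fourth decimal place, with the exact count slightly smaller because Stirling's approximation overestimates the binomial coefficient.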

A different approach is instead to encode information in the error syndromes. For simplicity, 
we consider the case when N is large. In this case, it suffices to consider only typical errors. 
We begin with the case where the physical channel is noise-free. 

For large $N$, almost all (probability $1-\epsilon$) combinations of errors on the individual qubits
will correspond to one of the set of typical errors. There are roughly $2^{sN}$ of these, and their
probabilities $p_E$ are all bounded within a range $2^{-N(s+\delta)} < p_E < 2^{-N(s-\delta)}$. The number $s$ is the
entropy of the channel on one qubit; for the BSC $s = h(p) = -p\log_2 p - (1-p)\log_2(1-p)$,
and for the DC $s = -(1-p)\log_2(1-p) - p\log_2(p/3)$. We label the typical error operators
$E_0, E_1, \ldots, E_{2^{sN}-1}$, and their corresponding probabilities are $p_j$. A good choice of QECC for
the cover text will be able to correct all these errors. We make the simplifying assumption that
the QECC is nondegenerate, so each typical error $E_j$ has a distinct error syndrome labelled $s_j$.

Ahead of time, Alice and Bob partition the typical errors into $C$ roughly equiprobable sets
$S_k$, so that

$\sum_{E_j \in S_k} p_j \approx 1/C$ .   (7)

As far as possible, the errors in a given set should be chosen to have roughly equal probabilities.
The maximum of $C$ is roughly $C \approx 2^{N(s-\delta)}$, and $k = 0, \ldots, C-1$. We can now present a new
quantum steganographic protocol, using error syndromes to store information.

1. Alice prepares $k_c$ qubits of cover text in a state $|\psi_c\rangle$.

2. Alice's secret message is a string of $\log_2 C \approx N(s-\delta)$ qubits, in a state

$|\psi_s\rangle = \sum_{k=0}^{C-1} \alpha_k |k\rangle$ .   (8)

She "twirls" each qubit of this string, using $2N(s-\delta)$ bits of the secret key or shared
ebits, to get a maximally mixed state. To this, she appends $N - k_c - (s-\delta)N$ extra
ancilla qubits in the state $|0\rangle$ to make up a total register of $N - k_c$ qubits.

3. Using the shared secret key, Alice chooses from each set $S_k$ a typical error $E_{j_k}$ with
syndrome $s_{j_k}$. She applies a unitary $U_S$ to the register of $N - k_c$ qubits, that maps
$U_S\left(|k\rangle \otimes |0\rangle^{\otimes (N - k_c - (s-\delta)N)}\right) = |s_{j_k}\rangle$. She appends this register to the cover
qubits in state $|\psi_c\rangle$, then applies the encoding unitary $U_E$. Averaging over the secret key,
the resulting state will appear to Eve like $\rho \approx \sum_{j=0}^{2^{sN}-1} p_j E_j |\Psi_c\rangle\langle\Psi_c| E_j^\dagger$, where
$|\Psi_c\rangle$ is the encoded cover text. This is effectively indistinguishable from the channel
being emulated acting on the encoded cover text.

4. Alice sends this codeword to Bob. If Eve examines its syndrome, she will find a typical
error for the channel being emulated.

5. Bob applies the decoding unitary $U_D = U_E^\dagger$, and then applies $U_S^\dagger$ (which he knows using
the shared secret key). He discards the cover text and the last $N - k_c - (s-\delta)N$ ancilla
qubits, and undoes the twirling operation on the remaining qubits, again using the secret key.
If Eve has not measured the qubits, he will have recovered the state (8) encoded by Alice.

This protocol may easily be used to send classical information by using a single basis state rather
than a superposition like (8). The steganographic transmission rate $\mathcal{R}$ is roughly
$\mathcal{R} \approx s - \delta \to s$. The rate of transmission $s$ is higher than the rate $4p/3$ of our first protocol.
This protocol used $2N(s-\delta)$ bits of secret key (or ebits) for twirling in step 2, and roughly
$N\delta$ bits of secret key in choosing representative errors $E_{j_k}$ from each set $S_k$ in step 3. So the
key rate is roughly $\mathcal{K} \approx 2s - \delta \to 2s$, better than the first protocol in key usage per stego
qubit transmitted. Since almost all the key usage goes to the twirling operation, for sources that
are maximally mixed on average the rate of key usage can actually go to zero as $N \to \infty$.
However, this encoding is much trickier in the case where the channel contains intrinsic noise.
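The comparison between the two protocols can be made concrete for the depolarizing channel. The sketch below evaluates the single-qubit entropy $s$, the two transmission rates in the limit $\delta \to 0$, and the key used per stego qubit. For the first protocol it assumes the large-$N$ key count $h(\beta) + 2\beta$ per channel qubit with $\beta = 4p/3$, which is an idealization; the function names are illustrative.

```python
import math


def h2(x):
    """Binary entropy in bits."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def entropy_dc(p):
    """Entropy s of the depolarizing channel on one qubit."""
    return -(1 - p) * math.log2(1 - p) - p * math.log2(p / 3)


def rates(p):
    """Stego qubits per channel qubit for both protocols (delta -> 0)."""
    return {"protocol_1": 4 * p / 3, "protocol_2": entropy_dc(p)}


def key_per_stego_qubit(p):
    """Key bits spent per stego qubit sent (noiseless physical channel)."""
    beta = 4 * p / 3
    first = (h2(beta) + 2 * beta) / beta     # subset choice plus twirling
    second = 2 * entropy_dc(p) / entropy_dc(p)  # twirling only, i.e. 2
    return {"protocol_1": first, "protocol_2": second}
```

For example, at `p = 0.05` the syndrome-based protocol carries roughly seven times as many stego qubits per channel qubit as the first protocol, while spending two key bits per stego qubit instead of more than seven.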

In principle this quantum steganographic protocol can be used when the channel contains
noise. The steganographic qubits are first encoded in a QECC to protect them against the noise
in the channel. In practice, for many channels this can be difficult: the effects of errors on the
space of syndromes look quite different from a usual additive error channel. Also, unlike the
depolarizing channel, general channels when composed together may change their type. However,
by drawing on codes with suitable properties, the problem of designing steganographic
protocols for general channels may be simplified. We discuss a simple example in the supporting
online material (SOM), but the solution for a general channel is a problem for future work.

What is the standard of security for a stego protocol? There are two obvious considerations. 
First, if Eve becomes suspicious, can she read the message? At the cost of using one-time pads 
or twirling, Alice and Bob can prevent this from happening. 

The more important question is, can Alice and Bob avoid arousing Eve's suspicions in the
first place? To do this, the messages that Alice sends must emulate as closely as possible the
channel that Eve expects. We can make this condition quantitative. Let $\mathcal{E}_C$ be the channel on
$N$ qubits that Eve expects, and let $\mathcal{E}_S$ be the effective channel that Alice and Bob produce with
their steganographic protocol. Then the protocol is secure if $\mathcal{E}_S$ is $\epsilon$-close to $\mathcal{E}_C$ in the
diamond norm, $\|\mathcal{E}_S - \mathcal{E}_C\|_\diamond \le \epsilon$, for some small $\epsilon > 0$. The diamond norm is directly related
to the probability for Eve to distinguish $\mathcal{E}_C$ from $\mathcal{E}_S$ under ideal circumstances (i.e., when she
controls both inputs and outputs), and so puts an upper bound on her ability to distinguish
them in practice.

For a simple example, the difference between two DCs applied to $N$ qubits has norm

$\|\mathcal{N}_r^{\otimes N} - \mathcal{N}_p^{\otimes N}\|_\diamond = \sum_{j=0}^{N} \binom{N}{j} \left| r^j(1-r)^{N-j} - p^j(1-p)^{N-j} \right|$ ,   (9)

where $p$ is the error rate of the channel Eve expects and $r = p + \delta p$ the error rate of the
steganographic channel that emulates Eve's expected channel. If we make $\delta p < \epsilon\sqrt{p(1-p)/N}$
then we can make this norm as small as we like, while communicating $O(\delta p N) = O(\epsilon\sqrt{N})$
secret qubits. This indicates that even if Eve has exact knowledge of the channel, Alice and
Bob can in principle send an arbitrarily large (but finite) amount of information without
arousing Eve's suspicion, by choosing a sufficiently small $\delta p$ and large $N$ (8). If Eve's knowledge
of the channel is imperfect, Alice and Bob can do even better, communicating steganographic
information at a nonzero rate. If Eve is constantly monitoring the channel over a long period
of time, and if she has exact knowledge of the channel, then she will eventually learn that
Alice and Bob are communicating with each other steganographically. Moreover, with constant
measurement Eve can disrupt the superpositions of the steganographic qubits and prevent any
information from ever reaching Bob, effectively flooding the quantum channel with noise.
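The scaling argument can be illustrated by evaluating the binomial sum for the norm directly. In the sketch below the binomial terms are computed in log space so that large $N$ is tractable; the choice $\delta p = \epsilon\sqrt{p(1-p)/N}$, the function names and the parameter values are assumptions made for illustration.

```python
import math


def log_binom_term(N, j, x):
    """log of C(N, j) x^j (1 - x)^(N - j), evaluated without overflow."""
    return (math.lgamma(N + 1) - math.lgamma(j + 1) - math.lgamma(N - j + 1)
            + j * math.log(x) + (N - j) * math.log1p(-x))


def channel_distance(N, p, dp):
    """Sum over j of |Bin(N, p + dp)(j) - Bin(N, p)(j)|."""
    r = p + dp
    return sum(abs(math.exp(log_binom_term(N, j, r))
                   - math.exp(log_binom_term(N, j, p)))
               for j in range(N + 1))


def stego_shift(N, p, eps):
    """Extra error rate dp = eps * sqrt(p (1 - p) / N)."""
    return eps * math.sqrt(p * (1 - p) / N)


p, eps = 0.1, 0.1
sizes = (100, 1000, 10000)
norms = [channel_distance(N, p, stego_shift(N, p, eps)) for N in sizes]
extra_errors = [N * stego_shift(N, p, eps) for N in sizes]  # grows like sqrt(N)
```

In this example every entry of `norms` stays below `eps`, while `extra_errors`, the expected number of additional errors available to carry hidden information, grows by a factor of about $\sqrt{10}$ at each step.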

If Alice and Bob have shared ebits, they can perform measurements on each of their halves 
and distill correlated random bits. Moreover, with shared ebits Alice can send her quantum 
information to Bob via quantum teleportation by sending only classical bits through the channel. 
These classical bits are the result of her measurement on her half of the ebits and her stego 
qubits. To Eve who may be monitoring the channel, these bits will look maximally mixed 
(random). For her to change the outcome of what Bob receives on his end, Eve would have to
disrupt the bits. So if Eve is measuring the channel continuously, Alice and Bob can still send
quantum information to each other using their shared ebits.

Figure 1: There are three different inputs to the steganographic encoder $\mathcal{E}$: a cover message $|C\rangle$; the
secret message that we would like to hide, which can be quantum $|S\rangle$ or classical $S$; a shared secret
key which may be quantum (ebits) $|K\rangle$ or classical $K$. Eve can monitor some part of the noisy quantum
channel $\mathcal{N}$ shown in the red box. Bob can decode the steganographic message using the decoder $\mathcal{D}$ and
the shared secret key $|K\rangle$ or $K$ and recover $|C\rangle$, and $|S\rangle$ or $S$, with very high probability.

Figure 2: Alice hides her information qubit (solid brown circle) by swapping it in with a qubit of her
quantum codeword. She uses her shared secret key with Bob to determine which qubit to swap. She uses
the shared key again to twirl the information qubit. She further applies random depolarizing errors to
the rest of the qubits of the codeword (shown in green). She sends the codeword through a depolarizing
channel to Bob, who uses the shared secret to correctly apply the untwirling operation, followed by
locating and swapping out Alice's original information qubit.

Figure 3: Key consumption rate vs. error rate. We plot the key consumption rate (KCR) as a function
of the error rate $p$ of the channel.

Supporting Online Material

We assume a basic knowledge of quantum information science at the level of (1). In this section
we gather the definition of the diamond norm and some of its relevant properties to derive the
norm of the difference between $N$ uses of two binary symmetric channels (BSC) and two
depolarizing channels (DC). We refer the reader to John Watrous's lecture notes for the definition
and properties of the diamond norm (2). As mentioned in the main text, the diamond norm gives
us a measure of how "close" or similar two channels can be when they transform an arbitrary
density matrix from one Hilbert space to another. More formally, let $\mathcal{N}$ be some arbitrary
superoperator, $\mathcal{N} : L(\mathcal{V}) \to L(\mathcal{W})$, where $L(\cdot)$ is the space of linear operators on the
Hilbert spaces $\mathcal{V}$ and $\mathcal{W}$. Then one can define the diamond norm of $\mathcal{N}$ as:

$\|\mathcal{N}\|_\diamond = \|I_{L(\mathcal{V})} \otimes \mathcal{N}\|_{tr}$ ,   (S1)

where $\|\cdot\|_{tr}$ is defined as:

$\|\mathcal{N}\|_{tr} = \max\{\|\mathcal{N}(O)\|_{tr} : O \in L(\mathcal{V}), \|O\|_{tr} = 1\}$ .   (S2)

The maximization in (S2) is over all density matrices. When the Hilbert space is infinite
dimensional we take the supremum of the set defined in (S2).

Binary Symmetric Channel 

Let $0 < p < 1/2$ be the rate at which Alice flips the qubits of her codeword. Let $r = p + \delta p$
be the rate at which the BSC flips qubits, where $\delta p$ is some additional noise which is not under
the control of either Alice or Bob. We assume that $0 < p < r < 1/2$ because at $p = 1/2$ the
channel has zero capacity to send information and $p > 1/2$ means that more qubits are being
flipped, which is unnatural for this channel. For a single qubit ($N = 1$) let $\mathcal{N}_p$ be the BSC that
Alice applies to an arbitrary single-qubit density operator $\rho$:

$\mathcal{N}_p\rho = (1-p)\rho + pX\rho X$ ,   (S3)

and let $\mathcal{N}_r$ be the actual BSC

$\mathcal{N}_r\rho = (1-r)\rho + rX\rho X$ .   (S4)

We can now express the difference of the two channels as:

$(\mathcal{N}_r - \mathcal{N}_p)\rho = (p-r)\rho + (r-p)X\rho X$ .   (S5)

We can express the diamond norm of the difference of the channels $\mathcal{N}_p$ and $\mathcal{N}_r$ as:

$\|\mathcal{N}_r - \mathcal{N}_p\|_\diamond = \max_\rho \|(I \otimes (\mathcal{N}_r - \mathcal{N}_p))\rho\|_{tr}$   (S6)

$= (r-p)\max_\rho \|(I \otimes I)\rho(I \otimes I) - (I \otimes X)\rho(I \otimes X)\|_{tr}$ .   (S7)

When we substitute $\rho = \psi \otimes |0\rangle\langle 0|$ ($\psi$ is some arbitrary density operator) in the above equation
we achieve the maximum:

$\|\mathcal{N}_r - \mathcal{N}_p\|_\diamond = (r-p)\,\|\psi \otimes |0\rangle\langle 0| - \psi \otimes |1\rangle\langle 1|\|_{tr}$   (S8)

$\le (r-p)\left(\|\psi \otimes |0\rangle\langle 0|\|_{tr} + |-1|\,\|\psi \otimes |1\rangle\langle 1|\|_{tr}\right)$   (S9)

$= (r-p)\left(\|\psi\|_{tr}\,\||0\rangle\langle 0|\|_{tr} + \|\psi\|_{tr}\,\||1\rangle\langle 1|\|_{tr}\right)$   (S10)

$= (r-p)(1+1)$   (S11)

$= 2(r-p)$   (S12)

$= 2(p + \delta p - p)$   (S13)

$= 2\delta p$ .   (S14)

In (S9) we use the triangle inequality and in (S10) we use the fact that for any two linear
operators $A$ and $B$, the trace norm of their tensor product is equal to the product of their trace
norms, i.e., $\|A \otimes B\|_{tr} = \|A\|_{tr}\|B\|_{tr}$. We would like an expression for the optimal probability
to correctly distinguish two channels:

$P_{opt} = \frac{1}{2} + \frac{1}{4}\|\mathcal{N}_r - \mathcal{N}_p\|_\diamond$ .   (S15)

So for a single-qubit use

$P_{opt} = \frac{1}{2}(1 + \delta p)$ .   (S16)

For the case where we have two qubits, we can write Alice's BSC as:

$(\mathcal{N}_p \otimes \mathcal{N}_p)\rho = (1-p)^2\rho + p(1-p)X_1\rho X_1 + p(1-p)X_2\rho X_2 + p^2 X_1X_2\rho X_1X_2$ ,   (S17)

where $X_1 = X \otimes I$, $X_2 = I \otimes X$, and $X_1X_2 = X \otimes X$. We can similarly calculate $\mathcal{N}_r \otimes \mathcal{N}_r$.
We can now write the difference between the two channels as:

$(\mathcal{N}_r \otimes \mathcal{N}_r - \mathcal{N}_p \otimes \mathcal{N}_p)\rho = (r^2 - 2r + 2p - p^2)\rho$
$\quad + (r - r^2 - p + p^2)(X_1\rho X_1 + X_2\rho X_2)$   (S18)
$\quad + (r^2 - p^2)X_1X_2\rho X_1X_2$ .

The diamond norm of the difference between two BSCs on two qubits can be expressed as:

$\|\mathcal{N}_r \otimes \mathcal{N}_r - \mathcal{N}_p \otimes \mathcal{N}_p\|_\diamond = \max_\rho \|(I \otimes (\mathcal{N}_r \otimes \mathcal{N}_r - \mathcal{N}_p \otimes \mathcal{N}_p))\rho\|_{tr}$ .   (S19)

We use a construction similar to the single-qubit case to maximize the right side of (S19).
Letting $\rho = \psi \otimes |00\rangle\langle 00|$ in (S19), we get:

$\|\mathcal{N}_r \otimes \mathcal{N}_r - \mathcal{N}_p \otimes \mathcal{N}_p\|_\diamond = |(1-r)^2 - (1-p)^2| + 2|r(1-r) - p(1-p)| + |r^2 - p^2|$ .   (S20)

Given our constraints that $0 < p < r < 1/2$, the first term on the right side of (S18) is negative
while the second and third terms are positive. This gives us:

$\|\mathcal{N}_r \otimes \mathcal{N}_r - \mathcal{N}_p \otimes \mathcal{N}_p\|_\diamond = 2(r-p)(2-r-p)$   (S21)

$= 2\delta p(2 - 2p - \delta p)$ .   (S22)

So in the two-qubit case $P_{opt}$ is:

$P_{opt} = \frac{1}{2}\left(1 + \delta p(2 - 2p - \delta p)\right)$ .   (S23)

If we observe (S20) carefully we find that the terms are distributed binomially. For the case
where we have $N$ qubits, we can use $\rho = \psi \otimes |00\cdots 0\rangle\langle 00\cdots 0|$ to maximize the diamond
norm for $N$ uses of the BSC to get:

$\|\mathcal{N}_r^{\otimes N} - \mathcal{N}_p^{\otimes N}\|_\diamond = \sum_{j=0}^{N} \binom{N}{j} \left| r^j(1-r)^{N-j} - p^j(1-p)^{N-j} \right|$ .   (S24)
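As a consistency check, the binomial sum for $N$ uses of the BSC can be evaluated for $N = 1$ and $N = 2$ and compared with the closed forms $2\delta p$ and $2\delta p(2 - 2p - \delta p)$ that follow from $2(r-p)$ and $2(r-p)(2-r-p)$ with $r = p + \delta p$. The sketch below is illustrative; the function names are not taken from the text.

```python
import math


def bsc_norm(N, p, r):
    """Binomial sum: sum_j C(N, j) |r^j (1-r)^(N-j) - p^j (1-p)^(N-j)|."""
    return sum(math.comb(N, j)
               * abs(r**j * (1 - r)**(N - j) - p**j * (1 - p)**(N - j))
               for j in range(N + 1))


def p_opt(norm):
    """Optimal probability of distinguishing two channels: 1/2 + norm/4."""
    return 0.5 + 0.25 * norm


p, dp = 0.1, 0.05
r = p + dp
one_use = bsc_norm(1, p, r)    # closed form: 2 dp
two_uses = bsc_norm(2, p, r)   # closed form: 2 dp (2 - 2p - dp)
```

For `p = 0.1` and `dp = 0.05` the two values come out as 0.1 and 0.175, matching the closed forms, and `p_opt(one_use)` equals $(1 + \delta p)/2$.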

Depolarizing Channel 

The calculation of the diamond norm of the difference between $N$ uses of two depolarizing
channels (DC) is similar to the calculation for the BSC that we performed in the previous section.
The expression for the channel is

$\mathcal{N}_p\rho = (1-p)\rho + (p/3)(X\rho X + Y\rho Y + Z\rho Z)$ .   (S25)

Eve sees a channel with a somewhat higher rate $r = p + \delta p$. As in the BSC case we assume that
$0 < p < r < 1/2$. For the $N = 2$ case the difference between the two depolarizing channels is:

$(\mathcal{N}_r \otimes \mathcal{N}_r - \mathcal{N}_p \otimes \mathcal{N}_p)\rho = ((1-r)^2 - (1-p)^2)\rho$
$\quad + ((1-r)(r/3) - (1-p)(p/3))(X_1\rho X_1 + \cdots + Z_2\rho Z_2)$   (S27)
$\quad + ((r/3)^2 - (p/3)^2)(X_1X_2\rho X_1X_2 + \cdots + Z_1Z_2\rho Z_1Z_2)$ .

As in the BSC case we can express the diamond norm as in (S19). The density matrix that
maximizes the trace norm is $\rho = \psi \otimes |\Phi^+\rangle\langle\Phi^+|$, where $|\Phi^+\rangle = (1/\sqrt{2})(|00\rangle + |11\rangle)$, and $\psi$ is
some arbitrary single-qubit density operator.

$\|\mathcal{N}_r \otimes \mathcal{N}_r - \mathcal{N}_p \otimes \mathcal{N}_p\|_\diamond = |(1-r)^2 - (1-p)^2|$
$\quad + 6|(1-r)(r/3) - (1-p)(p/3)|$   (S28)
$\quad + 9|(r/3)^2 - (p/3)^2|$
$= |(1-r)^2 - (1-p)^2| + 2|(1-r)r - (1-p)p| + |r^2 - p^2|$ .

After evaluating the absolute value terms, we get:

$\|\mathcal{N}_r \otimes \mathcal{N}_r - \mathcal{N}_p \otimes \mathcal{N}_p\|_\diamond = 2(r-p)(2-r-p)$
$= 2\delta p(2 - 2p - \delta p)$ .   (S29)

So,

$P_{opt} = \frac{1}{2} + \frac{1}{2}\delta p(2 - 2p - \delta p)$ .   (S30)

For the general case of $N$ uses of the depolarizing channel we may write the diamond norm as:

$\|\mathcal{N}_r^{\otimes N} - \mathcal{N}_p^{\otimes N}\|_\diamond = \sum_{j=0}^{N} \binom{N}{j} \left| r^j(1-r)^{N-j} - p^j(1-p)^{N-j} \right|$ ,   (S31)

which is exactly the same expression as for the BSC.

Achievable Rate for Protocol 2

We will work out the simplest example: the BSC in the case where the physical channel is
noise-free. The errors in the codewords that Alice sends to Bob are binomially distributed. Let
$pN$ be the mean of this distribution and let the variance be $pN\delta$, where $0 < \delta \ll 1$. Here $N$ is
the length of each codeword. Let

$p_k = \binom{N}{k} p^k (1-p)^{N-k}$   (S32)

be the probability that Alice applies an error of weight $k$ to her codeword. For each $k$ from
$Np(1-\delta)$ to $Np(1+\delta)$ choose $C_k$ strings of weight $k$. Let

$C = \sum_{k=Np(1-\delta)}^{Np(1+\delta)} C_k$ .   (S33)

Let these sets of strings be called $S_k$, and

$S = \bigcup_k S_k$ .   (S34)

So the total number of strings in the set $S$ is $C$. Define the probability $q = 1/C$. Then we want
to satisfy $qC_k = C_k/C = p_k$. Clearly we must have $C_k \le \binom{N}{k}$ for all $k$. This implies that:

$\binom{N}{k} p^k(1-p)^{N-k} = qC_k \le q\binom{N}{k}$
$\Rightarrow p^k(1-p)^{N-k} \le q$ .

We want $C$ to be as large as possible, which means we want $q$ to be as small as possible. For
$p < 1/2$ the constraint is tightest at the smallest weight, $k = Np(1-\delta)$, which gives us

$C = 1/q$
$\Rightarrow C \le p^{-Np(1-\delta)}(1-p)^{-N(1-p+p\delta)}$ .

The number of bits that Alice can send is, therefore,

$M \approx \log_2 C$
$= N\left(-p\log_2 p - (1-p)\log_2(1-p) + \delta(p\log_2 p - p\log_2(1-p))\right)$
$= N\left(h(p) - p\delta\log_2((1-p)/p)\right)$ .   (S35)

So with this encoding Alice can send almost $Nh(p)$ bits.
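The last step of this derivation is easy to check numerically: the logarithm of the bound $p^{-Np(1-\delta)}(1-p)^{-N(1-p+p\delta)}$ should coincide with $N(h(p) - p\delta\log_2((1-p)/p))$ and approach $Nh(p)$ as $\delta \to 0$. The sketch below uses illustrative function names and parameter values.

```python
import math


def h2(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def log2_max_sets(N, p, delta):
    """log2 of the bound p^(-N p (1 - delta)) (1 - p)^(-N (1 - p + p delta))."""
    return (-N * p * (1 - delta) * math.log2(p)
            - N * (1 - p + p * delta) * math.log2(1 - p))


def stego_bits(N, p, delta):
    """Closed form N (h(p) - p delta log2((1 - p) / p))."""
    return N * (h2(p) - p * delta * math.log2((1 - p) / p))


N, p, delta = 1000, 0.1, 0.05
bits = stego_bits(N, p, delta)
```

For $p < 1/2$ the correction term is positive, so `bits` lies slightly below `N * h2(p)`; the gap shrinks linearly as `delta` decreases.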

Diamond norm for protocol 2

Again we consider the simplest case of the BSC. Let $N$ be sufficiently large so that the total
probability of the typical errors is $> 1 - \epsilon$, and these typical errors have weight $k$ in the range
$Np(1-\delta) < k < Np(1+\delta)$. We divide up all errors of weight $k$ into $C_k$ partitions containing
approximately

$\frac{1}{C_k}\binom{N}{k}$

errors each. Within each set the errors are all equally likely to be chosen. However, because the
number of errors is unlikely to divide exactly evenly into $C_k$ sets, the probabilities $q_k$ of an error
of weight $k$ will be slightly different from the probability $p_k = p^k(1-p)^{N-k}$ of the binomial
distribution. We can put a (not-very-tight) bound on this difference:

[bound on $|q_k - p_k|$ lost in extraction]

Plugging this into the expression for the diamond norm, we get

[expression lost in extraction; the surviving fragments show a sum over $k$ from $Np(1-\delta)+1$ to $Np(1+\delta)$]

which is exponentially small in $N$.

Error-correction for protocol 2 with a noisy channel 

Since errors can act in a complicated manner on the space of syndromes, it is not entirely clear 
what the optimal encoding is even for a simple channel. Here we present one encoding for the 
BSC that gives an achievable rate in the limit of large N, but it is quite likely that higher rates 
are possible. 

In the noiseless case, it is possible to use the $C(N,M)$ strings of weight $M$ as a code: each
string represents one possible weight-$M$ error. If we then apply a BSC with probability $p$, on
average $Np$ bits would be flipped. If $Np \ll M$ then one can keep only a subset of the weight-$M$
strings, separated by a distance $> 2Np$.

This encoding quickly becomes inefficient as $p$ gets larger. Using the shared secret key,
Alice can instead choose only a subset of the $N$ bits to hold the codewords. If this subset includes
$N'$ bits, then the errors on the remaining $N - N'$ bits are irrelevant and do not need to be
corrected. The limit of this would be similar to encoding 1 in the paper, where $N' \approx 2M$.

Let $N' = qN$ for some $0 < q < 1$. The number of strings of weight $M$ is $C(qN, M)$,
and there will be an average number of bit flips $pqN$ on the relevant portion of the codeword.
Keep a subset of these codewords separated by distance $2pqN$. Decoding is done by finding the
closest codeword to the output string.

As $N, M \to \infty$ the number of codewords will go like

$C(N,M,p,q) \approx \binom{qN}{M} \Big/ \binom{qN}{pqN}$ .

The number of bits will be $\log_2 C(N,M,p,q)$.

Since $q$ is a parameter we can choose freely, we choose it to maximize the rate
$\mathcal{R}(N,M,p,q) = (1/N)\log_2 C(N,M,p,q)$. Using the Stirling approximation, differentiating with
respect to $q$, and setting the result equal to 0, we can solve for $q$:

$q = \frac{M}{N}\left(\frac{2^{h(p)}}{2^{h(p)} - 1}\right)$ .

We can then plug this back into the formula for $\mathcal{R}$. If the physical channel has error rate $p$ and
Alice is attempting to emulate a channel with error rate $p + \delta p$, then $M = N\delta p/(1-2p)$. This
gives us the following expression for the rate:

$\mathcal{R}(p, \delta p) = -\frac{\delta p}{1-2p}\log_2\left(2^{h(p)} - 1\right)$ .

We can compare this to the rate from encoding 1, which for the BSC is $2\delta p(1-h(p))/(1-2p)$.
It is not hard to see that $\mathcal{R}(p, \delta p)$ above approaches this rate as $p \to 1/2$ (and both rates go to
zero), but as $p \to 0$ this encoding does considerably better than encoding 1. It is quite likely,
however, that there may be even more efficient encodings.
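The optimization over $q$ can be reproduced numerically. The sketch below assumes that the number of codewords scales as $C(qN, M)/C(qN, pqN)$, so that in the Stirling limit the rate is $q\,[h(m/q) - h(p)]$ with $m = M/N$. Under that assumption the stationary point is $q = m\,2^{h(p)}/(2^{h(p)} - 1)$ and the optimized rate is $-m\log_2(2^{h(p)} - 1)$. The function names and parameter values are illustrative.

```python
import math


def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def rate(q, m, p):
    """Stirling-limit rate q * (h(m/q) - h(p)), valid for m <= q <= 1."""
    return q * (h2(m / q) - h2(p))


def optimal_q(m, p):
    """Stationary point q = m * 2^h(p) / (2^h(p) - 1)."""
    t = 2 ** h2(p)
    return m * t / (t - 1)


def optimal_rate(p, dp):
    """Rate at the optimal q with m = dp / (1 - 2p)."""
    return -dp / (1 - 2 * p) * math.log2(2 ** h2(p) - 1)


def encoding_1_rate(p, dp):
    """Rate quoted for encoding 1 on the BSC: 2 dp (1 - h(p)) / (1 - 2p)."""
    return 2 * dp * (1 - h2(p)) / (1 - 2 * p)


p, dp = 0.05, 0.01
m = dp / (1 - 2 * p)
q_star = optimal_q(m, p)
```

At `p = 0.05` the optimized rate is about one and a half times the encoding 1 rate, while at `p = 0.49` the two agree to within a fraction of a percent, consistent with the limits described above.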

Shared classical secret key vs. shared ebits 

If Alice and Bob share a secret, random key, they can use the steganographic encodings
described in the paper. Shared entanglement (ebits) can act as a resource in the same way: by
measuring the two halves of a maximally entangled pair of qubits $(|00\rangle + |11\rangle)/\sqrt{2}$, Alice and
Bob can generate a shared secret bit.

However, the use of ebits does open up an additional possibility beyond what can be done 
with a classical key. Instead of sending quantum information through the channel, Alice can 
instead teleport qubits to Bob. Teleportation consumes one ebit and requires the transmission of 
two classical bits for each qubit teleported. These classical bits can be sent through the channel 
steganographically. Because these bits are perfectly random, no one-time pad or twirling is 
needed. And because they are purely classical information, they are not disrupted if Eve chooses 
to measure the error syndromes, as a general quantum state would be. In this sense, quantum 
steganography with shared ebits is more powerful than quantum steganography with a shared 
classical key. 